External Sandhi and its Relevance to Syntactic Treebanking
Abstract
External sandhi refers to a set of sound changes that occur at word boundaries. These changes resemble phonological processes such as assimilation and fusion when they apply at the level of prosody, as in connected speech. In some languages, external sandhi formation is reflected in the orthography. In such languages, it produces forms that are morphologically unanalyzable, which poses a problem for all kinds of NLP applications. In this paper, we discuss the implications of this phenomenon for the syntactic annotation of sentences in Telugu, an Indian language with agglutinative morphology. We describe in detail how external sandhi formation in Telugu, if not handled prior to dependency annotation, leads to either loss or misrepresentation of syntactic information in the treebank. This phenomenon, we argue, necessitates the introduction of a sandhi-splitting stage in the generic annotation pipeline currently followed for the treebanking of Indian languages. We identify one type of external sandhi that occurs widely in the previous version of the Telugu treebank (version 0.2) and manually split all its instances, leading to the development of a new version, 0.5. We also conduct an experiment with a statistical parser to empirically verify the usefulness of the changes made to the treebank. Comparing the parsing accuracies obtained on versions 0.2 and 0.5 of the treebank, we observe that splitting even just one type of external sandhi increases the overall parsing accuracy.
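As a rough illustration of where a sandhi-splitting stage could sit in an annotation pipeline, the sketch below expands fused tokens into their component words before any dependency annotation takes place. It is not the procedure used for the treebank, where the splitting was done manually. The function name `split_sandhi`, the lookup table `SPLIT_TABLE`, and the placeholder tokens are all hypothetical, and no real Telugu forms are shown.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table: fused surface form -> its component word forms.
# The entries are placeholders, not real Telugu data.
SPLIT_TABLE: Dict[str, Tuple[str, ...]] = {
    "AB": ("A", "B"),
}


def split_sandhi(tokens: List[str],
                 table: Dict[str, Tuple[str, ...]] = SPLIT_TABLE) -> List[str]:
    """Replace each fused token with its component words; keep other tokens."""
    result: List[str] = []
    for token in tokens:
        # A token missing from the table is passed through unchanged.
        result.extend(table.get(token, (token,)))
    return result


if __name__ == "__main__":
    sentence = ["x", "AB", "y"]
    # After splitting, "A" and "B" are separate tokens, so each can later
    # receive its own node and head in the dependency tree.
    print(split_sandhi(sentence))
```

Running the split before annotation means that every word has its own token, so an annotator or parser can attach each one separately instead of treating the fused form as a single unanalyzable unit.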
Similar resources
Frequency Effects on French Liaison
A mainstay of the debates concerning the phonology-syntax interface are phenomena of external sandhi, that is, phonological alternations whose conditioning environment is across a word boundary. A recurrent problem in this area is the fact that it is usually impossible to motivate a purely syntactic account of such alternations. This has led to the widespread consensus that the relation betwee...
The effect of production planning locality on external sandhi: a study in /t/
External sandhi processes, in which the target of an alternation is in a different word from the trigger of the alternation, differ from word-internal phonological processes in two important ways: they are subject to locality conditions that constrain which two-word sequences the process can apply to, and they are more likely to be "optional" or inherently variable. Locality con...
Syntactic Annotation in the Columbia Arabic Treebank
The Columbia Arabic Treebank (CATiB) is a database of syntactic analyses of Arabic sentences. CATiB contrasts with previous approaches to Arabic treebanking in its emphasis on faster production with some constraints on linguistic richness. Two basic ideas inspire the CATiB approach. First, CATiB avoids the annotation of redundant linguistic information that is determinable automaticall...
Shape Conditions and Phonological Context
Features of modulation and of phonetic modification play a great part in many syntactic constructions; they are known as sandhi. The form of a word or phrase as it is spoken alone is known as its absolute form; the forms which appear in included positions are its sandhi-forms. Thus, in English, the absolute form of the indefinite article is a [ˈej]. . . . If the next word begins with a vowel, w...
Locality in Phonology and Production Planning
This paper explores the idea that certain locality effects on phonologically conditioned allomorphy and external sandhi processes can be explained by the locality of production planning. The first part of this paper presents evidence for this hypothesis based on locality effects in phonological conditions on the choice of allomorph of the affix /ing/ between [in] and [iN]. A second and more spe...
Journal title:
- Polibits
Volume 43
Publication date: 2011